test: improve CLI test determinism and remove redundant test logic#1123
test: improve CLI test determinism and remove redundant test logic#1123cv merged 7 commits intoNVIDIA:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughType-only updates to Vitest Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
nemoclaw/src/blueprint/runner.test.ts (1)
577-641:⚠️ Potential issue | 🟠 MajorPlease restore regression coverage for
apply --planrejection.
mainno longer tests the unsupported--planpath, but runtime still rejects it inactionApply. This leaves CLI parse/dispatch behavior unguarded.Proposed test addition
describe("main (CLI)", () => { @@ it("parses apply with --profile and --endpoint-url", async () => { await main(["apply", "--profile", "default", "--endpoint-url", "https://override.test/v1"]); expect(mockedValidateEndpoint).toHaveBeenCalledWith("https://override.test/v1"); expect(stdoutText()).toContain("PROGRESS:100:Apply complete"); }); + + it("rejects apply when --plan is provided (not yet implemented)", async () => { + await expect( + main(["apply", "--profile", "default", "--plan", "/tmp/plan.json"]), + ).rejects.toThrow(/--plan is not yet implemented/); + });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemoclaw/src/blueprint/runner.test.ts` around lines 577 - 641, Add a test in the existing "main (CLI)" suite that exercises the unsupported apply --plan path: call main with arguments like ["apply","--plan","some-plan.json"] (after the existing beforeEach setup) and assert it rejects with an error containing "--plan" (or the exact rejection text emitted by actionApply); this restores regression coverage for the main -> actionApply dispatch path and ensures CLI parsing still rejects the --plan option at runtime.test/cli.test.js (1)
16-27:⚠️ Potential issue | 🔴 CriticalCritical bug:
spawnSyncis misconfigured — tests will fail withTypeError: r.out.includes is not a function.The current implementation has multiple issues:
spawnSyncdoes not throw exceptions — unlikeexecSync, it always returns a result object with anerrorproperty. Thetry-catchblock will never catch non-zero exits; the returnedoutis always the full result object{error, status, stdout, stderr, ...}, which has no.includes()method.Tests call
.includes()on an object — every test assertion liker.out.includes("Getting Started")will fail at runtime withTypeError: r.out.includes is not a function.Missing
shell: true— without it,spawnSynctreats the string as a literal executable name (looking for a file namednode "${CLI}" ${args}), resulting in ENOENT instead of executing the shell command.To fix, use:
Corrected implementation
function runWithEnv(args, env = {}, timeout = 10000) { - try { - const out = spawnSync(`node "${CLI}" ${args}`, { - encoding: "utf-8", - timeout, - env: { ...process.env, HOME: "/tmp/nemoclaw-cli-test-" + Date.now(), ...env }, - }); - return { code: 0, out }; - } catch (err) { - return { code: err.status, out: (err.stdout || "") + (err.stderr || "") }; - } + const result = spawnSync(`node "${CLI}" ${args}`, { + shell: true, + encoding: "utf-8", + timeout, + env: { ...process.env, HOME: "/tmp/nemoclaw-cli-test-" + Date.now(), ...env }, + }); + const out = (result.stdout || "") + (result.stderr || ""); + return { code: result.status ?? 1, out }; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/cli.test.js` around lines 16 - 27, The runWithEnv function misuses spawnSync: it never throws, returns a result object (so tests calling r.out.includes fail), and the command string needs shell: true; fix runWithEnv by calling spawnSync with shell: true (or pass command and args as an array), then read the returned result.stdout/stderr (convert to string) and result.status/result.error to determine exit code; return { code: <numeric status or error.status>, out: <stdout + stderr as string> } so callers can safely call r.out.includes; update references in runWithEnv to use the result object fields instead of assuming spawnSync throws.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@nemoclaw/src/blueprint/runner.test.ts`:
- Around line 577-641: Add a test in the existing "main (CLI)" suite that
exercises the unsupported apply --plan path: call main with arguments like
["apply","--plan","some-plan.json"] (after the existing beforeEach setup) and
assert it rejects with an error containing "--plan" (or the exact rejection text
emitted by actionApply); this restores regression coverage for the main ->
actionApply dispatch path and ensures CLI parsing still rejects the --plan
option at runtime.
In `@test/cli.test.js`:
- Around line 16-27: The runWithEnv function misuses spawnSync: it never throws,
returns a result object (so tests calling r.out.includes fail), and the command
string needs shell: true; fix runWithEnv by calling spawnSync with shell: true
(or pass command and args as an array), then read the returned
result.stdout/stderr (convert to string) and result.status/result.error to
determine exit code; return { code: <numeric status or error.status>, out:
<stdout + stderr as string> } so callers can safely call r.out.includes; update
references in runWithEnv to use the result object fields instead of assuming
spawnSync throws.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 10c32d50-fb8f-495a-8091-6c998082c50e
📒 Files selected for processing (5)
nemoclaw/src/blueprint/runner.test.tsnemoclaw/src/blueprint/snapshot.test.tsnemoclaw/src/blueprint/state.test.tstest/cli.test.jstest/uninstall.test.js
e965901 to
93ab0bb
Compare
cv
left a comment
There was a problem hiding this comment.
Thanks — the determinism goal here makes sense, but I think this needs a bit more work before merge.
Two blockers from the current diff:
-
test/cli.test.js: theexecSync->spawnSyncswap is not equivalent as written.spawnSync(node "${CLI}" ${args}, ...)will try to execute a binary with that full string as the executable name unlessshell: trueis set, so this should hitENOENT. Also,spawnSyncreturns{ status, stdout, stderr, error }and does not throw on non-zero exit, so the helper now returns{ code: 0, out: resultObj }on success instead of a string, and the existing error path no longer matchesexecSyncsemantics. I think this is the likely cause of the failingtest-unitjob. If we wantspawnSynchere, I would switch tospawnSync("node", [CLI, ...args], ...)and rebuild the helper aroundstatus/stdout/stderr/error. -
nemoclaw/src/blueprint/runner.test.ts: I don’t think the--plantest is redundant yet.runner.tson currentmainstill explicitly throws--plan is not yet implemented...inactionApply(), so removing this test drops coverage for behavior that still exists in production code.
Optional follow-up: the bash -lc -> bash -c direction in test/uninstall.test.js seems reasonable, but the file is not Prettier-clean right now, which may explain the red lint job. Also, for the HOME cases, setting HOME via env is safer than embedding HOME="..." source ... inside the command string.
Happy to re-review once those are addressed.
07e6b03 to
8324977
Compare
|
I’ve reverted the execSync → spawnSync change in test/cli.test.js. The previous swap wasn’t equivalent (as you pointed out: ENOENT risk + different return/error semantics), and keeping execSync preserves the current behavior and test expectations. Also re-added the --plan test to retain coverage for the existing behavior in runner.ts. For test/uninstall.test.js, I switched to bash -c and moved HOME into env for more deterministic behavior. I’ll make sure the file is Prettier-clean as well. Happy to revisit a proper spawnSync refactor separately if that’s something we want to pursue. |
90d9218 to
1cc0663
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/uninstall.test.js (1)
98-122: Clean up temp directories consistently in helper tests.These three tests create temp dirs but don’t remove them, unlike the
--yestest. Please wrap each intry/finally(or extract a small temp-dir helper) to avoid/tmpbuildup in repeated local/CI runs.♻️ Suggested pattern
it("removes the user-local nemoclaw shim", () => { const tmp = fs.mkdtempSync( path.join(os.tmpdir(), "nemoclaw-uninstall-shim-"), ); - const shimDir = path.join(tmp, ".local", "bin"); - const shimPath = path.join(shimDir, "nemoclaw"); - const targetPath = path.join(tmp, "prefix", "bin", "nemoclaw"); - - fs.mkdirSync(shimDir, { recursive: true }); - fs.mkdirSync(path.dirname(targetPath), { recursive: true }); - fs.writeFileSync(targetPath, "#!/usr/bin/env bash\n", { mode: 0o755 }); - fs.symlinkSync(targetPath, shimPath); - - const result = spawnSync( - "bash", - ["-c", `source "${UNINSTALL_SCRIPT}"; remove_nemoclaw_cli`], - { - cwd: path.join(import.meta.dirname, ".."), - encoding: "utf-8", - env: createFakeNpmEnv(tmp), - }, - ); - - expect(result.status).toBe(0); - expect(fs.existsSync(shimPath)).toBe(false); + try { + const shimDir = path.join(tmp, ".local", "bin"); + const shimPath = path.join(shimDir, "nemoclaw"); + const targetPath = path.join(tmp, "prefix", "bin", "nemoclaw"); + + fs.mkdirSync(shimDir, { recursive: true }); + fs.mkdirSync(path.dirname(targetPath), { recursive: true }); + fs.writeFileSync(targetPath, "#!/usr/bin/env bash\n", { mode: 0o755 }); + fs.symlinkSync(targetPath, shimPath); + + const result = spawnSync( + "bash", + ["-c", `source "${UNINSTALL_SCRIPT}"; remove_nemoclaw_cli`], + { + cwd: path.join(import.meta.dirname, ".."), + encoding: "utf-8", + env: createFakeNpmEnv(tmp), + }, + ); + + expect(result.status).toBe(0); + expect(fs.existsSync(shimPath)).toBe(false); + } finally { + fs.rmSync(tmp, { recursive: true, force: true }); + } });Also applies to: 125-149, 152-174
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/uninstall.test.js` around lines 98 - 122, The tests creating temporary directories (variables tmp, shimDir, shimPath, targetPath) and invoking spawnSync to run remove_nemoclaw_cli should ensure the temp dir is removed after each test; wrap the setup + test execution in a try/finally (or use a small temp-dir helper) and call fs.rmSync(tmp, { recursive: true, force: true }) in the finally block so the temp directory is always cleaned even on failures; apply the same pattern to the other similar tests that create temp dirs (the blocks around the other spawnSync calls using createFakeNpmEnv and UNINSTALL_SCRIPT).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/uninstall.test.js`:
- Around line 98-122: The tests creating temporary directories (variables tmp,
shimDir, shimPath, targetPath) and invoking spawnSync to run remove_nemoclaw_cli
should ensure the temp dir is removed after each test; wrap the setup + test
execution in a try/finally (or use a small temp-dir helper) and call
fs.rmSync(tmp, { recursive: true, force: true }) in the finally block so the
temp directory is always cleaned even on failures; apply the same pattern to the
other similar tests that create temp dirs (the blocks around the other spawnSync
calls using createFakeNpmEnv and UNINSTALL_SCRIPT).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4f69b1b4-a9aa-4c7d-8f0b-1980f8407c4e
📒 Files selected for processing (2)
nemoclaw/src/blueprint/snapshot.test.tstest/uninstall.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
- nemoclaw/src/blueprint/snapshot.test.ts
|
Workflows should be good to run, if all looks good. I've signed the commit using a GPG key, resolved linting issues and added a signature at the bottom of the PR to ensure ownership. Let me know if you have any questions/concerns. |
|
@cv This looks good to merge. |
10615a5 to
753e559
Compare
…1123) ## Summary Improves test determinism, consistency, and reliability across CLI, uninstall, and blueprint test suites by standardizing shell invocation, tightening execution patterns, and removing redundant or outdated test code. --- ## Related Issue Fixes #977 (part 1) --- ## Changes - Normalize shell invocation: - Replace `bash -lc` with `bash -c` in uninstall tests to avoid shell initialization side effects - Improve CLI test stability: - Increase timeouts for long-running commands - Standardize usage of `runWithEnv(..., timeout)` - Remove redundant / outdated test code: - Clean up unused or deprecated test logic in `runner.test.ts` - Improve test consistency: - Align execution patterns across CLI and uninstall tests - Preserve security coverage: - Maintain regression protections (e.g., path validation and credential handling) --- ## Verification - `npm test` passes locally - `npx prek run --all-files` passes in CI - No changes to CLI behavior or runtime logic - Existing security and regression tests continue to pass --- ## Rationale Some tests relied on shell initialization behavior (`bash -lc`) and inconsistent execution patterns, leading to flakiness and non-deterministic outcomes. These updates: - eliminate shell-dependent variability - standardize execution across test suites - improve reliability without impacting functionality Additionally, minor cleanup removes redundant or outdated test code to improve maintainability. --- ## Risk Assessment **Low risk** - Changes are limited to test code and execution behavior - No production code paths modified - Security and regression coverage preserved **Rollback** - Fully reversible by reverting test changes --- ## Type of Change - [x] Test / infrastructure improvement (no behavioral change) - [x] Code cleanup / maintenance --- ## Testing - [x] `npm test` passes - [x] `npx prek run --all-files` passes (CI) --- ## Checklist ### General - [x] Contributing guide followed ### Code Changes - [x] Formatters applied - [x] No user-facing behavior changes - [x] No secrets committed --- ## Summary by CodeRabbit (updated) * **Tests** * Improved CLI and uninstall test determinism by standardizing shell invocation * Increased timeouts to reduce flakiness in long-running test cases * Removed redundant or outdated test logic for improved maintainability <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * Improved TypeScript typing inside test mocks for safer compilation. * Added a shared snapshot constant and strengthened a null-check assertion. * Simplified test environment handling by explicitly setting HOME and adjusting shell invocation semantics. * Removed redundant inline comment assertions and tightened output checks. * Minor formatting and clarity tweaks across test suites for easier maintenance. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: Krish Sapru <[email protected]> --------- Co-authored-by: Carlos Villela <[email protected]>
Summary
Improves test determinism, consistency, and reliability across CLI, uninstall, and blueprint test suites by standardizing shell invocation, tightening execution patterns, and removing redundant or outdated test code.
Related Issue
Fixes #977 (part 1)
Changes
bash -lcwithbash -cin uninstall tests to avoid shell initialization side effectsrunWithEnv(..., timeout)runner.test.tsVerification
npm testpasses locallynpx prek run --all-filespasses in CIRationale
Some tests relied on shell initialization behavior (
bash -lc) and inconsistent execution patterns, leading to flakiness and non-deterministic outcomes.These updates:
Additionally, minor cleanup removes redundant or outdated test code to improve maintainability.
Risk Assessment
Low risk
Rollback
Type of Change
Testing
npm testpassesnpx prek run --all-filespasses (CI)Checklist
General
Code Changes
Summary by CodeRabbit (updated)
Summary by CodeRabbit
Signed-off-by: Krish Sapru [email protected]